Add option to change error constructor #445
I'm not sure |
Hi @mysteriouslyseeing, thanks for your PR! What about using a similar syntax to
? E.g., with the new syntax: `#[logos(error(Error, callback))]`
`#[logos(error = SomeType)]` `#[logos(error_callback = callback)]` is now expressed as `#[logos(error(SomeType, callback))]`
book/src/attributes/logos.md
@@ -34,7 +34,8 @@ The type `ErrorType` can be any type that implements `Clone`, `PartialEq`,
 `Default` and `From<E>` for each callback's error type.

 `ErrorType` must implement the `Default` trait because invalid tokens, i.e.,
-literals that do not match any variant, will produce `Err(ErrorType::default())`.
+literals that do not match any variant, will produce `Err(ErrorType::default())`,
+unless you provide a callback with the alternate syntax `#[logos(error(ErrorType, callback = ...))]`
Sorry for the late review. Just one last comment: can you add a bit more details about the callback (signature) and maybe link to the example you wrote? Otherwise, users will not know it is documented :-)
Otherwise, it's all good to be merged!
I actually just changed the custom_error example to use the new error syntax instead of adding a new one, and the example is already linked in the book
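For readers following the documentation change discussed above, here is a minimal sketch of an error type that satisfies the stated bounds (`Clone`, `PartialEq`, `Default`, and `From<E>` for a callback's error type). It deliberately does not use the Logos derive macro; `LexError`, its variants, and `parse_integer` are hypothetical names chosen for illustration only.

```rust
use std::num::ParseIntError;

// Hypothetical error type; the names are illustrative and not taken from the PR.
#[derive(Clone, Debug, Default, PartialEq)]
pub enum LexError {
    // Stand-in for what `ErrorType::default()` yields when no variant matches.
    #[default]
    UnknownToken,
    // Produced when a callback fails to parse an integer literal.
    InvalidInteger(String),
}

// `From<E>` lets a callback's own error type be converted with `?`.
impl From<ParseIntError> for LexError {
    fn from(err: ParseIntError) -> Self {
        LexError::InvalidInteger(err.to_string())
    }
}

// Stand-in for a token callback that can fail with `ParseIntError`.
pub fn parse_integer(slice: &str) -> Result<u64, LexError> {
    Ok(slice.parse::<u64>()?)
}
```

The `Default` implementation covers the "no variant matched" case, while the `From` implementation covers failures raised inside callbacks.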
Just a heads-up that adding a defaulted method to a trait (which this PR does) is listed as a 'possibly-breaking' change in the cargo book's SemVer section.
Good point, meaning that it should be released under a version that bumps either the major or the minor part.
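To illustrate why a new defaulted trait method can break downstream code, here is a small sketch of the name-collision case. `Upstream`, `Downstream`, and this `Token` struct are hypothetical stand-ins, not types from Logos.

```rust
// Stand-in for the upstream trait after it gains a defaulted method.
pub trait Upstream {
    fn make_error(&self) -> &'static str {
        "upstream default"
    }
}

// Stand-in for a downstream trait that already had a method of the same name.
pub trait Downstream {
    fn make_error(&self) -> &'static str;
}

pub struct Token;

impl Upstream for Token {}

impl Downstream for Token {
    fn make_error(&self) -> &'static str {
        "downstream"
    }
}

pub fn call_both(token: &Token) -> (&'static str, &'static str) {
    // `token.make_error()` would be rejected as ambiguous (E0034) here,
    // so each call names its trait explicitly.
    (Upstream::make_error(token), Downstream::make_error(token))
}
```

Code that previously called `token.make_error()` with only `Downstream` providing the method would stop compiling once `Upstream` gains a method of the same name and both traits are in scope.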
According to CI's benchmarks, this PR results in a notable decrease in performance, but I don't really see why: https://github.com/maciejhirsz/logos/actions/runs/12214084915?pr=445.
I ran the benchmarks on my machine and got this:
Looks like both strings tests are definitely slower in both sets of benchmarks. I've got no idea why.
Looks like one of the tests also increased significantly, that's also weird. I sent an e-mail to Logos' author to see if he can enable CodSpeed, which might provide more accurate benchmarks.
The problem is on lines 381 and 385 of src/lexer.rs:
    #[cfg(not(feature = "forbid_unsafe"))]
    {
        self.token = core::mem::ManuallyDrop::new(Some(Err(Token::make_error(self))));
    }
    #[cfg(feature = "forbid_unsafe")]
    {
        self.token = Some(Err(Token::make_error(self)));
    }
Previously,
Maybe the problem is that the compiler can't optimise that like it used to? Setting self.token and creating the error both require references to self now - but surely it should be able to see that the reference isn't actually used?
Here are my benchmarks after reverting that specific line:
Thanks for your analysis @mysteriouslyseeing! Let's see if hinting the compiler that it should inline the default implementation can help, see my suggestion.
src/lib.rs
fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {
    Self::Error::default()
}
-fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {
-    Self::Error::default()
-}
+#[inline(always)]
+fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {
+    Self::Error::default()
+}
We can probably try this to see if that helps the compiler.
Performance is unchanged, unfortunately:
group                                       default_before                          default_changes
-----                                       --------------                          ---------------
count_ok/identifiers                        1.20    620.0±18.12ns  1198.2 MB/sec    1.00    518.1±28.66ns  1433.8 MB/sec
count_ok/keywords_operators_and_punctators  1.12   1731.6±95.53ns  1173.6 MB/sec    1.00  1552.2±150.95ns  1309.3 MB/sec
count_ok/strings                            1.00     410.6±6.04ns  2023.2 MB/sec    1.40    574.0±12.94ns  1447.1 MB/sec
iterate/identifiers                         1.15    594.5±30.62ns  1249.7 MB/sec    1.00    518.0±27.87ns  1434.3 MB/sec
iterate/keywords_operators_and_punctators   1.04   1623.9±49.04ns  1251.5 MB/sec    1.00  1559.1±111.50ns  1303.5 MB/sec
iterate/strings                             1.00    412.3±32.70ns  2014.8 MB/sec    1.39    571.8±16.91ns  1452.7 MB/sec
To get it to work, I changed the signature and default implementation of
    #[inline(always)]
    #[doc(hidden)]
    fn make_error(lexer: &mut Lexer<'source, Self>) {
        use internal::LexerInternal as _;
        lexer.set_error(Self::Error::default())
    }
This uses a new function on
    #[inline]
    fn set_error(&mut self, error: Token::Error) {
        #[cfg(not(feature = "forbid_unsafe"))]
        {
            self.token = core::mem::ManuallyDrop::new(Some(Err(error)));
        }
        #[cfg(feature = "forbid_unsafe")]
        {
            self.token = Some(Err(error));
        }
    }
Now the
    #[inline]
    fn error(&mut self) {
        self.token_end = self.source.find_boundary(self.token_end);
        Token::make_error(self);
    }
Mixing around the functions like this does make it harder to understand but it does fix the performance regression:
And here is the comparison between the old and new changes:
It would be nice to know how |
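The restructuring described in the comment above (the trait hook writes the error through a setter on the lexer instead of returning it) can be modelled in isolation. This is only a sketch under simplifying assumptions: `MiniLexer`, `MiniToken`, `Plain`, `Custom`, and `take` are hypothetical names, and the `ManuallyDrop`/`forbid_unsafe` handling and the boundary adjustment are omitted.

```rust
// Simplified stand-in for the lexer; only the field relevant here is kept.
pub struct MiniLexer<T: MiniToken> {
    token: Option<Result<T, T::Error>>,
}

pub trait MiniToken: Sized {
    type Error: Default;

    // Default hook: store `Error::default()` through the lexer's setter,
    // mirroring the restructured `make_error` from the comment above.
    #[inline(always)]
    fn make_error(lexer: &mut MiniLexer<Self>) {
        lexer.set_error(Self::Error::default());
    }
}

impl<T: MiniToken> MiniLexer<T> {
    pub fn new() -> Self {
        MiniLexer { token: None }
    }

    #[inline]
    pub fn set_error(&mut self, error: T::Error) {
        self.token = Some(Err(error));
    }

    // Called when no token matches; delegates to the (overridable) hook.
    #[inline]
    pub fn error(&mut self) {
        T::make_error(self);
    }

    // Hand the stored result to the caller, leaving `None` behind.
    pub fn take(&mut self) -> Option<Result<T, T::Error>> {
        self.token.take()
    }
}

// Token type that relies on the default hook.
#[derive(Debug, PartialEq)]
pub struct Plain;

impl MiniToken for Plain {
    type Error = ();
}

// Token type that overrides the hook with a custom error value.
#[derive(Debug, PartialEq)]
pub struct Custom;

impl MiniToken for Custom {
    type Error = String;

    fn make_error(lexer: &mut MiniLexer<Self>) {
        lexer.set_error(String::from("unexpected input"));
    }
}
```

In this shape the assignment to `token` happens inside a method that already holds `&mut self`, so the call site no longer combines a field write with a call that borrows the whole lexer. Whether that is the actual reason for the benchmark difference is not established in this thread.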
Replaced usages with `set`
Thanks for your efforts @mysteriouslyseeing! I am not sure I understand your two benchmark results: the first is this PR's changes, and the second is the change between the last commit and the second-to-last commit?
Exactly, yes
Hmm, the benchmark results are surprising, as they seem to indicate both performance increases and decreases: https://github.com/maciejhirsz/logos/actions/runs/12279903326/attempts/2#summary-34310133559.
Hmmm, that is strange. Looks like it's mostly increases? Identifiers are worse, but only without
I'll hold this for one or two more weeks, to see if @maciejhirsz can enable CodSpeed (it should provide better instruments for precise benchmarking). After that, I will try to run the benchmarks locally and see if it is safe to merge :-) Sorry for the delay!
Closes #444.